{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0ace3d7e",
   "metadata": {},
   "source": [
    "# 1. Basic Format and Structure of the Submission File\n",
    "\n",
    "In the 1st phase, participants must submit a `.zip` file containing four `.npy` files. Each `.npy` file should store a binary causal matrix that reflects the causal graph specific to its dataset. It's important to note that within a causal matrix, the element at the `i-th` row and `j-th` column represents the presence (or absence) of a directed edge from alarm type (i.e. alarm_id) `i` to alarm type (i.e. alarm_id) `j`.\n",
    "\n",
    "# 2. Reference to Standard Submission File\n",
    "\n",
    "For a comprehensive example of the standard submission file, please refer to the following link: [NeurIPS CSL Competition Submission File](https://github.com/huawei-noah/trustworthyAI/blob/master/competition/NeurIPS2023/submission/submission.zip).\n",
    "\n",
    "# 3. Naming Rule of `.npy` Files\n",
    "\n",
    "The `.npy` file naming convention should follow this pattern: `{dataset name}_graph_matrix.npy`.\n",
    "\n",
    "For example:\n",
    "- The `.npy` file for dataset_1: `dataset_1_graph_matrix.npy`\n",
    "\n",
    "The subsequent script demonstrates a basic example of generating an `.npy` file for `dataset_1`. Similar steps can be followed for other datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "99b34125",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afae3bde",
   "metadata": {},
   "source": [
    "#### 3.1 Load Relevant Files for Training in `dataset_1`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0d2960e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape of alarm data: (141853, 4)\n",
      "shape of causal prior matrix: (39, 39)\n"
     ]
    }
   ],
   "source": [
    "# alarm data\n",
    "alarms = pd.read_csv(r'./datasets/dataset_1/alarm.csv')\n",
    "# causal_prior\n",
    "causal_prior= np.load(r'./datasets/dataset_1/causal_prior.npy')\n",
    "\n",
    "\n",
    "print(f\"shape of alarm data: {alarms.shape}\")\n",
    "print(f\"shape of causal prior matrix: {causal_prior.shape}\")\n",
    "# Notes: topology.npy and rca_prior.csv are not used in this script."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43ade713",
   "metadata": {},
   "source": [
    "#### 3.2  Select `gCastle` as the Base Algorithm Library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "102fd26b",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-08-15 16:17:36,518 - /home/zhangkeli/anaconda3/lib/python3.9/site-packages/castle/backend/__init__.py[line:36] - INFO: You can use `os.environ['CASTLE_BACKEND'] = backend` to set the backend(`pytorch` or `mindspore`).\n",
      "2023-08-15 16:17:36,557 - /home/zhangkeli/anaconda3/lib/python3.9/site-packages/castle/algorithms/__init__.py[line:36] - INFO: You are using ``pytorch`` as the backend.\n"
     ]
    }
   ],
   "source": [
    "from castle.algorithms import PC\n",
    "from castle.common.priori_knowledge import PrioriKnowledge"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39010254",
   "metadata": {},
   "source": [
    "#### Illustrated by the `PC algorithm` with the ability to incorporate prior knowledge."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0b2a4700",
   "metadata": {},
   "outputs": [],
   "source": [
    "TIME_WIN_SIZE  = 300"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ec79d80e",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Use a time sliding window(300 seconds) to generate IID samples.\n",
    "alarms = alarms.sort_values(by='start_timestamp')\n",
    "alarms['win_id'] = alarms['start_timestamp'].map(lambda elem:int(elem/TIME_WIN_SIZE))\n",
    "\n",
    "samples=alarms.groupby(['alarm_id','win_id'])['start_timestamp'].count().unstack('alarm_id')\n",
    "samples = samples.dropna(how='all').fillna(0)\n",
    "samples = samples.sort_index(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2cba03a8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2017, 39)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "samples.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "85f0fb8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# create the prior knowledge object for the PC algorithm \n",
    "prior_knowledge = PrioriKnowledge(causal_prior.shape[0])\n",
    "for i, j in zip(*np.where(causal_prior == 1)):\n",
    "    prior_knowledge.add_required_edge(i, j)\n",
    "\n",
    "for i, j in zip(*np.where(causal_prior == 0)):\n",
    "    prior_knowledge.add_forbidden_edge(i, j)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7018a600",
   "metadata": {},
   "source": [
    "#### 3.3 Obtain the Causal Graph Matrix and Save it as a Numpy Array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "aca0942f",
   "metadata": {},
   "outputs": [],
   "source": [
    "pc = PC(priori_knowledge=prior_knowledge)\n",
    "pc.learn(samples)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "dc4cec1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "graph_matrix = np.array(pc.causal_matrix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "229146b9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(39, 39)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "graph_matrix.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "33816c86",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.save(r'./submission/dataset_1_graph_matrix.npy',graph_matrix)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
